Multi-dimensional text clustering with user behavior characteristics
LI Wanying, HUANG Ruizhang, DING Zhiyuan, CHEN Yanping, XU Liyang
Journal of Computer Applications    2018, 38 (11): 3127-3131.   DOI: 10.11772/j.issn.1001-9081.2018041357
Traditional multi-dimensional text clustering generally extracts features from text content, but seldom considers the interaction information between users and text data, such as likes, forwards, reviews, concerns, and references. Moreover, traditional multi-dimensional text clustering mainly combines multiple spatial dimensions linearly and fails to consider the relationships between attributes in each dimension. In order to make effective use of text-related user behavior information, a Multi-dimensional Text Clustering with User Behavior Characteristics (MTCUBC) model was proposed. According to the principle that the similarity between texts should be consistent across different spaces, the similarity was adjusted by using the user behavior information as constraints on the text content clustering, and the distance between texts was then refined by a metric learning method, so that the clustering effect was improved. Extensive experiments verify that the proposed MTCUBC model is effective, and the results show clear advantages on high-dimensional sparse data compared with linearly combined multi-dimensional clustering.
Multi-source text topic mining model based on Dirichlet multinomial allocation model
XU Liyang, HUANG Ruizhang, CHEN Yanping, QIAN Zhisen, LI Wanying
Journal of Computer Applications    2018, 38 (11): 3094-3099.   DOI: 10.11772/j.issn.1001-9081.2018041359
With the rapid increase of text data sources, topic mining for multi-source text data has become a research focus of text mining. Since traditional topic models are mainly oriented to a single source, they have many limitations when applied directly to multi-source data. Therefore, a multi-source topic model based on the Dirichlet Multinomial Allocation (DMA) model was proposed, namely MSDMA (Multi-Source Dirichlet Multinomial Allocation), considering the differences in topic word distributions between sources and the nonparametric clustering property of DMA. The main contributions of the proposed model are as follows: 1) it takes into account the characteristics of each source when modeling topics, and can learn the source-specific word distributions of topic k; 2) it can improve topic discovery performance on sources with high noise and little information through knowledge sharing; 3) it can automatically learn the number of topics within each source without requiring that number to be specified in advance. The experimental results on a simulated dataset and two real datasets indicate that the proposed model can extract topic information more effectively and efficiently than state-of-the-art topic models.
Interval-value attribute reduction algorithm for meteorological observation data based on genetic algorithm
ZHENG Zhongren, CHENG Yong, WANG Jun, ZHONG Shuiming, XU Liya
Journal of Computer Applications    2017, 37 (9): 2678-2683.   DOI: 10.11772/j.issn.1001-9081.2017.09.2678
Aiming at the problems that meteorological observation data acquisition is weakly targeted, that data redundancy is high, and that the large number of single values in the observation data intervals leads to low precision of equivalence partitioning, an attribute reduction algorithm for Meteorological Observation data Interval-value based on Genetic Algorithm (MOIvGA) was proposed. Firstly, by improving the similarity degree of interval values, the proposed algorithm was made suitable for both single-value equivalence relation judgment and interval-value similarity analysis. Secondly, the convergence of the algorithm was improved by an improved adaptive genetic algorithm. Finally, the simulation experiments show that the number of iterations of the proposed algorithm is reduced by 22 compared with the method that uses the AGAv (Adaptive Genetic Attribute reduction) algorithm to solve for the optimal value. In the classification of precipitation over a 1-hour time interval, the average classification accuracy of the MOIvGA algorithm is 6.3% higher than that of the RIvD (λ-Reduction in Interval-valued decision table based on Dependence) algorithm, and the accuracy of no-rain forecasting is increased by 7.13%; at the same time, the classification accuracy can be significantly improved by the attribute subset obtained by running the MOIvGA algorithm. Therefore, the MOIvGA algorithm can increase the convergence rate and the classification accuracy in the analysis of interval-value meteorological observation data.
Distributed fault detection for wireless sensor network based on cumulative sum control chart
LIU Qiuyue, CHENG Yong, WANG Jun, ZHONG Shuiming, XU Liya
Journal of Computer Applications    2016, 36 (11): 3016-3020.   DOI: 10.11772/j.issn.1001-9081.2016.11.3016
Given the constrained resources and distributed nature of wireless sensor networks, fault diagnosis of sensor nodes faces great challenges. In order to solve the problem that existing approaches to diagnosing sensor networks have a high false alarm ratio and considerable computational redundancy on nodes, a new fault detection mechanism based on the Cumulative Sum control chart (CUSUM) and neighbor coordination was proposed. Firstly, the historical data on a single node were analyzed by CUSUM to improve the sensitivity of fault diagnosis and locate the change point. Then, the faulty nodes were detected by judging the status of nodes through data exchange between neighbor nodes. The experimental results show that the detection accuracy is over 97.7% and the false alarm ratio is below 2% when the sensor fault probability in the wireless sensor network is up to 35%. Hence, the proposed algorithm achieves high detection accuracy and a low false alarm ratio even under high fault probabilities, and clearly reduces the influence of sensor fault probability.
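The single-node CUSUM step mentioned in the abstract above can be illustrated with a minimal sketch. This is a generic two-sided tabular CUSUM, not the authors' implementation: the function name `cusum_change_point`, the baseline window `baseline_len`, the slack `k`, and the decision threshold `h` are all assumptions chosen for illustration, and the neighbor-coordination stage is not covered.

```python
from statistics import mean, stdev


def cusum_change_point(samples, baseline_len=20, k=0.5, h=5.0):
    """Return the index of the first sample at which a two-sided tabular
    CUSUM raises an alarm, or None if no alarm is raised.

    All parameters are illustrative assumptions:
      baseline_len -- number of leading samples treated as fault-free
      k            -- slack (allowance), in baseline standard deviations
      h            -- decision threshold, in baseline standard deviations
    """
    baseline = samples[:baseline_len]
    mu = mean(baseline)
    sigma = stdev(baseline) or 1e-9  # guard against a perfectly flat baseline

    s_hi = 0.0  # accumulates evidence of an upward shift
    s_lo = 0.0  # accumulates evidence of a downward shift
    for i in range(baseline_len, len(samples)):
        z = (samples[i] - mu) / sigma  # standardized deviation from baseline
        s_hi = max(0.0, s_hi + z - k)
        s_lo = max(0.0, s_lo - z - k)
        if s_hi > h or s_lo > h:
            return i
    return None
```

For example, calling `cusum_change_point([20.0, 20.1, 19.9, 20.2, 19.8] * 4 + [23.0] * 10)` flags the first reading after the jump to 23.0, while a series that keeps repeating the baseline pattern raises no alarm. In a scheme like the one described, a node would presumably run such a check on its own history before comparing status with its neighbors.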